✨ switch to Huber Regression with L2 normalization for sliding window method#44
✨ switch to Huber Regression with L2 normalization for sliding window method#44
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request switches the sliding window method from Theil-Sen regression to HuberRegressor for approximately 2x performance improvement while maintaining similar accuracy. Additionally, it migrates dependency management from requirements.txt to direct specification in pyproject.toml, which is a more modern and streamlined approach.
Changes:
- Replaced scipy's
theilslopeswith scikit-learn'sHuberRegressorin the sliding window fitting algorithm - Migrated dependencies from requirements.txt to pyproject.toml's dependencies field
- Added scikit-learn as a new dependency
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/growthcurves/non_parametric.py | Replaced Theil-Sen estimator with HuberRegressor for robust linear regression in sliding window method; added sklearn import |
| requirements.txt | Removed file entirely as dependencies now specified in pyproject.toml |
| pyproject.toml | Added direct dependency specification including new scikit-learn dependency; commented out requirements.txt loading |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # # Use Theil-Sen estimator for robust line fitting | ||
| # result = theilslopes(y_log_win, t_win) | ||
| # slope, intercept = result.slope, result.intercept | ||
| # # Use HuberRegressor which uses L2 regularization and is twice as fast as | ||
| # # Theil-Sen. |
There was a problem hiding this comment.
The comment has a double hash '# #' which appears to be a formatting issue. This should be cleaned up to use a single '#' for consistency with standard Python comment formatting.
| # # Use Theil-Sen estimator for robust line fitting | ||
| # result = theilslopes(y_log_win, t_win) | ||
| # slope, intercept = result.slope, result.intercept |
There was a problem hiding this comment.
The commented-out code for Theil-Sen should be removed rather than left in place. Since this is a deliberate switch to HuberRegressor and the PR description indicates it's roughly twice as fast with similar results, the old code should be deleted to keep the codebase clean.
| # # Use HuberRegressor which uses L2 regularization and is twice as fast as | ||
| # # Theil-Sen. |
There was a problem hiding this comment.
The comment states "uses L2 regularization" but this is inaccurate. HuberRegressor uses L2 regularization as a penalty on the coefficients (controlled by the alpha parameter), but the main feature is that it uses the Huber loss function for robustness to outliers, not L2 regularization. The comment should clarify that HuberRegressor is robust to outliers through the Huber loss function.
| # # Use HuberRegressor which uses L2 regularization and is twice as fast as | |
| # # Theil-Sen. | |
| # # Use HuberRegressor, which is robust to outliers via the Huber loss | |
| # # and is typically faster than Theil-Sen. |
| from logging import getLogger | ||
|
|
||
| import numpy as np | ||
| import sklearn.linear_model |
There was a problem hiding this comment.
The import style import sklearn.linear_model is inconsistent with the rest of the codebase which uses from <package>.<module> import <class> pattern (e.g., from scipy.interpolate import make_smoothing_spline). Consider using from sklearn.linear_model import HuberRegressor for consistency.
| # from scipy.stats import theilslopes | ||
|
|
||
|
|
There was a problem hiding this comment.
The commented-out import from scipy.stats import theilslopes should be removed entirely as it's no longer needed after switching to HuberRegressor.
| # from scipy.stats import theilslopes |
| # use requirements.txt instead of pyproject.toml for dependencies | ||
| # https://stackoverflow.com/a/73600610/9684872 | ||
| # ! uncomment also dependencies in the dynamic section above | ||
| [tool.setuptools.dynamic] | ||
| dependencies = {file = ["requirements.txt"]} | ||
| # [tool.setuptools.dynamic] | ||
| # dependencies = {file = ["requirements.txt"]} |
There was a problem hiding this comment.
The outdated comments from the previous requirements.txt approach should be removed. These comments (lines 25-27, 28-29) reference the old approach and are now misleading since dependencies are now directly defined in the dependencies list.
before (Theil-Sen):
now (HuberRegressor):
The ladder is roughly twice as fast.
Need to fix in a separate PR a regression in plotting.ipynb tutorial